Experiments In Constructing A Corpus Of Discourse Trees

Authors

  • Daniel Marcu
  • Estibaliz Amorrortu
  • Magdalena Romera
Abstract

We discuss a tagging schema and a tagging tool for labeling the rhetorical structure of texts. We also propose a statistical method for measuring agreement of hierarchical structure annotations and we discuss its strengths and weaknesses. The statistical measure we use suggests that annotators can achieve good levels of agreement on the task of determining the high-level, rhetorical structure of texts. Our empirical experiments also suggest that building discourse parsers that incrementally derive correct rhetorical structures of unrestricted texts without applying any form of backtracking is unfea-

Similar resources

Experiments in Constructing a Corpus of Discourse Trees: Problems, Annotation Choices, Issues

We present a tagging schema and a tagging tool for labeling the rhetorical structure of texts. We focus on presenting the difficulties that we faced in designing a discourse annotation manual and on discussing the choices that we made in order to address these difficulties. We report reliability results concerning our agreement on building the rhetorical structure of 90 texts of three genres: 3...

1 Alter 2 Loosen 3 Change Sequence 1 Alter 2 Loosen 3 Change Sequence Means 2 Loosen 3 Change Means

We present discourse annotation work aimed at constructing a parallel corpus of Rhetorical Structure trees for a collection of Japanese texts and their corresponding English translations. We discuss implications of our empirical findings for the task of text planning in the context of implementing multilingual natural language generation systems.

Contrasting the Automatic Identification of Two Discourse Markers in Multiparty Dialogues

The identification of occurrences of like and well that serve as discourse markers (DMs) is a classification problem which is studied here on a corpus of dialogue transcripts with more than 4,000 occurrences of each item. Decision trees using item-specific lexical, prosodic, positional and sociolinguistic features are trained using the C4.5 method. The results demonstrate improvement over past ...

A Corpus-based Study of Lexical Bundles in Discussion Section of Medical Research Articles

There has been increasing interest in utilizing corpora in linguistic research and pedagogy in recent years. Rhetorical organization of different sections of research articles may appear similar in various disciplines, but close examination may show subtle differences nonetheless. One of the features that has been at the center of attention especially in recent years is the idiomaticity of a di...

Linguistic Devices of Identity Representation in English Political Discourse with a Focus on Personal Pronouns: Power and Solidarity

The present study was aimed at exploring the use of pronominal reference for identity representation in terms of power and solidarity in English political discourse. The investigation was based on a corpus of four political interviews and debates amounting to 26,500 words. The analysis was both qualitative and quantitative. In the qualitative analysis, a discourse-analytic approach was used to fin...

Publication date: 1999